Leptonema illini Hovind-Hougen 1979 is the type species of the genus Leptonema, family Leptospiraceae, phylum Spirochaetes. Organisms of this family have a Gram-negative-like cell envelope consisting of a cytoplasmic membrane and an outer membrane. The peptidoglycan layer is associated with the cytoplasmic rather than the outer membrane. The two flagella of members of Leptospiraceae extend from the cytoplasmic membrane at the ends of the bacteria into the periplasmic space and are necessary for their motility. Here we describe the features of the L. illini type strain, together with the complete genome sequence and annotation. This is the first genome sequence (finished at the level of Improved High Quality Draft) to be reported from a member of the genus Leptonema, which is the third genus of the family Leptospiraceae for which complete or draft genome sequences are now available. The three scaffolds of the 4,522,760 bp draft genome sequence reported here, and its 4,230 protein-coding and 47 RNA genes, are part of the Genomic Encyclopedia of Bacteria and Archaea project.



Introduction 



Strain 3055T was isolated from the urine of a clinically healthy bull [1] and was first mentioned in the literature as a new Leptospira serotype, serovar illini [2,3]; however, as no name was proposed, it was not validly published. Valid publication occurred in the comparative study of Hovind-Hougen [4], who found morphological differences between 'Leptospira illini' strain 3055 and other members of Leptospira, i.e., the presence of cytoplasmic tubules and the structure of the basal complex of the flagella. These differences, together with the finding of a higher DNA G+C content and differences in growth behavior [5], were used as criteria to taxonomically separate strain 3055 from Leptospira as Leptonema illini, with strain 3055T (= DSM 21528 = NCTC 11301) as the type strain. This species is the only species of the genus. The family Leptospiraceae was created in the same publication [4], although the name had been proposed before, though not effectively published (J Pilot, PhD Thesis, University of Paris, Paris, France, 1965). Despite a description in the International Journal of Systematic Bacteriology, the name Leptonema was not included in the Approved Lists of Bacterial Names [6]. The omission of this name was not in accordance with the Bacteriological Code (1990 Revision), Rule 24a, Note 1, but was corrected in Validation List No. 10 [7].

The phylogenetic relatedness among spirochetes and the isolated position of L. illini were first elucidated by 16S rRNA cataloguing [8], then by comparative sequence analysis of reverse-transcribed 16S rRNA sequences [9] and by rDNA analyses [10,11]. The moderate similarity values between L. illini and strains of Leptospira were later supported by the absence of significant DNA-DNA hybridization values between members of the two genera [12-14], by 16S rRNA restriction fragment analysis [15] and by PCR amplification of the 16S-23S ribosomal DNA spacer [16]. Application of a 16S rRNA gene real-time PCR assay to leptospires [17] confirmed the presence of L. illini strains in the kidneys of Indian rats and bandicoots. Here we present a summary classification and a set of features for L. illini strain 3055T, together with the description of the complete genome sequencing and annotation. The rationale for sequencing the genome of this non-pathogenic strain is its isolated position within the phylum Spirochaetes.

Classification and features
16S rRNA gene sequence analysis

The single genomic 16S rRNA gene sequence of L. illini 3055T was compared using NCBI BLAST [18,19] under default settings (e.g., considering only the high-scoring segment pairs (HSPs) from the best 250 hits) with the most recent release of the Greengenes database [20], and the relative frequencies of taxa and keywords (reduced to their stem [21]) were determined, weighted by BLAST scores. The most frequently occurring genera were Leptospira (53.4%), Anaeromyxobacter (31.6%), Leptonema (11.5%), Turneriella (1.3%) and Desulfomonile (0.8%) (96 hits in total). Regarding the three hits to sequences from members of the species, the average identity within HSPs was 99.7%, whereas the average coverage by HSPs was 97.4%. Among all other species, the one yielding the highest score was Leptospira wolbachii (AY631890), which corresponded to an identity of 86.4% and an HSP coverage of 76.8%. (Note that the Greengenes database uses the INSDC (= EMBL/NCBI/DDBJ) annotation, which is not an authoritative source for nomenclature or classification.) The highest-scoring environmental sequence was EF648066 (Greengenes short name 'dynamics during produced water treatment aerobic activated sludge clone HB63'), which showed an identity of 99.2% and an HSP coverage of 98.4%. The most frequently occurring keywords within the labels of all environmental samples that yielded hits were 'microbi' (5.2%), 'soil' (2.3%), 'anaerob' (2.3%), 'industri' (2.0%) and 'ecolog' (1.4%) (154 hits in total). The most frequently occurring keywords within the labels of those environmental samples that yielded hits with a higher score than the highest-scoring species were 'microbi' (4.5%), 'cell' (3.1%), 'prmr' (3.0%), 'sediment' (3.0%) and 'coral' (3.0%) (12 hits in total). None of these keywords provides useful information about the close relatives of strain 3055T in the environment.
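The score-weighted genus frequencies and HSP averages quoted above can be illustrated with a short Python sketch. It assumes the BLAST hits have already been parsed into (genus, score) pairs and that each hit is summarized by a single HSP given as (percent identity, aligned query length). The function names and toy numbers are hypothetical, and the actual procedure behind [18-21] (keyword stemming, handling of several HSPs per hit) is not reproduced.

```python
from collections import defaultdict

def weighted_genus_frequencies(hits):
    """Share of the summed BLAST scores contributed by each genus.

    `hits` is an iterable of (genus, score) pairs, e.g. parsed from a
    tabular BLAST report (parsing not shown).
    """
    totals = defaultdict(float)
    for genus, score in hits:
        totals[genus] += score
    grand_total = sum(totals.values())
    # Most frequent genera first, as in the text.
    return sorted(
        ((genus, total / grand_total) for genus, total in totals.items()),
        key=lambda item: item[1],
        reverse=True,
    )

def mean_identity_and_coverage(hsps, query_length):
    """Mean percent identity and mean percent query coverage of a set of HSPs.

    Each HSP is (percent_identity, aligned_query_length); treating one HSP
    per hit is a simplifying assumption.
    """
    n = len(hsps)
    mean_identity = sum(identity for identity, _ in hsps) / n
    mean_coverage = sum(100.0 * aligned / query_length for _, aligned in hsps) / n
    return mean_identity, mean_coverage
```

With hypothetical hits [("Leptospira", 500.0), ("Leptospira", 300.0), ("Leptonema", 200.0)], the first function returns Leptospira with a share of 0.8 and Leptonema with 0.2; the percentages in the text are shares of this kind, computed over all 96 hits.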

Figure 1 shows the phylogenetic neighborhood of L. illini in a 16S rRNA-based tree. The sequence of the single 16S rRNA gene copy in the genome does not differ from the previously published 16S rRNA sequence (AY714984).

Morphology and physiology 

Strain 3055T is unicellular; its cells stain Gram-negative and are helical (13-21 µm long and 0.1 µm wide) [4] (Figure 2). Most cells have hook-shaped ends and display a typical leptospiral morphology [46]. The wavelength of the coils within the helix is about 0.6 µm, with an amplitude of about 0.1 µm. A single flagellum is inserted at each pole, and in well-preserved cells the flagellum is entwined with the helical body within the periplasmic space for about four to six turns of the helix (not visible in Figure 2). Rotation of the flagella by a flagellar motor induces changes in cell morphology that drive motility [47]. In cells treated with Myxobacter AL-1 protease [48], bundles of three to four cytoplasmic tubules are observed, which originate close to the insertion point of each of the two flagella. The bundles are located close to the inner side of the cytoplasmic membrane, just underneath the flagellum. As bundles and flagella are shorter than the total length of the cell, the middle part of the cell is devoid of both. Flagella released by the AL-1 protease are often found as spirals. Each flagellum consists of a core (diameter 10 nm) covered by a sheath (diameter 16 nm). One of the arguments for classifying strain 3055 as the type of a new genus was the structure of the flagellar insertion: in L. illini it resembles that of Gram-positive bacteria, whereas other leptospires possess the Gram-negative type of insertion [4].
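A back-of-the-envelope estimate shows why the middle of the cell can lack flagella and tubules. This Python sketch rests on two hypothetical assumptions not stated in the source: that the 13-21 µm cell length is measured along the helix axis, and that each helical turn followed by a flagellum spans one 0.6 µm wavelength of that axis.

```python
WAVELENGTH_UM = 0.6             # coil wavelength reported in the text
CELL_LENGTH_UM = (13.0, 21.0)   # reported range of cell length
FLAGELLUM_TURNS = (4, 6)        # turns each periplasmic flagellum follows

def helical_turns(cell_length_um: float, wavelength_um: float = WAVELENGTH_UM) -> float:
    """Number of helical turns, assuming length is measured along the helix axis."""
    return cell_length_um / wavelength_um

def flagellum_free_middle(cell_length_um: float, turns_per_flagellum: int,
                          wavelength_um: float = WAVELENGTH_UM) -> float:
    """Axial length (µm) not covered by either of the two polar flagella."""
    covered = 2 * turns_per_flagellum * wavelength_um  # one flagellum at each pole
    return max(cell_length_um - covered, 0.0)

if __name__ == "__main__":
    for length in CELL_LENGTH_UM:
        print(f"{length} µm cell: about {helical_turns(length):.0f} turns")
    # Smallest free stretch: shortest cell with the longest flagella.
    print(flagellum_free_middle(CELL_LENGTH_UM[0], FLAGELLUM_TURNS[1]))
    # Largest free stretch: longest cell with the shortest flagella.
    print(flagellum_free_middle(CELL_LENGTH_UM[1], FLAGELLUM_TURNS[0]))
```

Under these assumptions, a cell has roughly 22-35 turns, and the two flagella together cover only about 4.8-7.2 µm of its axis, leaving roughly 5.8-16.2 µm in the middle. The real geometry of a flexing cell is less regular, so these figures are only indicative.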
Figure 1. Phylogenetic tree highlighting the position of L. illini relative to the type strains of the other species within the phylum Spirochaetes. The tree was inferred from 1,325 aligned characters [22,23] of the 16S rRNA gene sequence under the maximum likelihood (ML) criterion [24]. Rooting was done initially using the midpoint method [25] and then checked for its agreement with the current classification (Table 1). The branches are scaled in terms of the expected number of substitutions per site. Numbers adjacent to the branches are support values from 550 ML bootstrap replicates [26] (left) and from 1,000 maximum-parsimony bootstrap replicates [27] (right) if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [28] are labeled with one asterisk, those also listed as 'Complete and Published' with two asterisks (see [29-35] and CP003155 for Sphaerochaeta pleomorpha, CP002903 for Spirochaeta thermophila, CP002696 for Treponema brennaborense, CP001841 for T. azotonutricium). The collapsed Treponema subtree contains three species formerly assigned to Spirochaeta that have recently been included in the genus Treponema, even though those names are not yet validly published [34].



Serum and long-chain fatty acids are required for growth, although no serum is required in trypticase soy broth. The organism is chemoorganotrophic and aerobic. Long-chain fatty acids (>14 carbons) are used as the source of carbon and energy. Ammonia, in the form of inorganic salts rather than amino acids, is used as the nitrogen source. Purines, but not pyrimidines, are utilized. Strain 3055T is non-pathogenic for hamsters, mice, gerbils, guinea pigs and cattle [15], although it may cause opportunistic infections, as it has been isolated from the blood of an HIV-infected patient [43].

Chemotaxonomy 

No data are available for fatty acids, quinones or polar lipids. The G+C content of the DNA was previously reported as 51-53 mol% [49], which is below the value inferred from the genome sequence (54.3%; Table 3).
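The genome-derived G+C value is computed directly from the assembled scaffolds. A minimal Python sketch follows. It assumes the scaffolds are available as plain strings. It also assumes that ambiguous bases, such as N runs filling scaffold gaps, are excluded from the denominator; this is an assumption, since conventions differ between pipelines. The function names are illustrative.

```python
def gc_content(sequence: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt

def genome_gc_content(scaffolds: list[str]) -> float:
    """Pooled G+C over all scaffolds (not the mean of per-scaffold values)."""
    return gc_content("".join(scaffolds))
```

Pooling the scaffolds matters: a simple mean of per-scaffold percentages would give the 13,579 bp scaffold the same weight as the 4.3 Mb one.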

Genome sequencing and annotation 

Genome project history 

This organism was selected for sequencing on the basis of its phylogenetic position [50] and is part of the Genomic Encyclopedia of Bacteria and Archaea project [51]. The genome project is deposited in the Genomes OnLine Database [28], and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI) using state-of-the-art sequencing technology [52]. A summary of the project information is shown in Table 2.



http://standardsingenomics.org 



179 



Leptonema illini type strain (3055T) 



Table 1. Classification and general features of L. illini 3055T according to the MIGS recommendations [36]

| MIGS ID | Property | Term | Evidence code |
|  | Current classification | Domain Bacteria | TAS [37] |
|  |  | Phylum Spirochaetes | TAS [38] |
|  |  | Class Spirochaetes | TAS [39,40] |
|  |  | Order Spirochaetales | TAS [41,42] |
|  |  | Family Leptospiraceae | TAS [4,14,42] |
|  |  | Genus Leptonema | TAS [4,7] |
|  |  | Species Leptonema illini | TAS [4,7] |
| MIGS-7 | Subspecific genetic lineage (strain) | 3055T | TAS [4] |
| MIGS-12 | Reference for biomaterial | Hovind-Hougen, 1979 | TAS [4] |
|  | Gram stain | negative | TAS [4] |
|  | Cell shape | helical rods | TAS [4] |
|  | Motility | motile | TAS [4] |
|  | Sporulation | non-sporulating | TAS [4] |
|  | Temperature range | mesophile | TAS [4] |
|  | Optimum temperature | 29 °C | TAS [4] |
|  | Salinity | not reported |  |
| MIGS-22 | Relationship to oxygen | aerobe | TAS [4] |
|  | Carbon source | long-chain fatty acids | TAS [4] |
|  | Energy metabolism | chemoorganotroph | TAS [4] |
| MIGS-6 | Habitat | not specified |  |
| MIGS-6.2 | pH | not reported |  |
| MIGS-15 | Biotic relationship | free living | TAS [4] |
| MIGS-14 | Known pathogenicity | opportunistic infections | TAS [43] |
| MIGS-16 | Specific host | Bos taurus (cow) | TAS [4] |
| MIGS-18 | Health status of host | healthy | TAS [4] |
|  | Biosafety level | 1 | TAS [44] |
| MIGS-19 | Trophic level | not reported |  |
| MIGS-23.1 | Isolation | urine of a bull | TAS [4] |
| MIGS-4 | Geographic location | Iowa | TAS [5] |
| MIGS-5 | Time of sample collection | 1965 | TAS [1] |
| MIGS-4.1 | Latitude | not reported |  |
| MIGS-4.2 | Longitude | not reported |  |
| MIGS-4.3 | Depth | not reported |  |
| MIGS-4.4 | Altitude | not reported |  |

Evidence codes - TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). Evidence codes are from the Gene Ontology project [45].
Figure 2. Scanning electron micrograph of L. illini 3055T
Table 2. Genome sequencing project information

| MIGS ID | Property | Term |
| MIGS-31 | Finishing quality | Improved high quality draft |
| MIGS-28 | Libraries used | Three genomic libraries: one 454 pyrosequence standard library, two 454 PE libraries (13 kb insert size), one Illumina library |
| MIGS-29 | Sequencing platforms | Illumina GAii, 454 GS FLX Titanium |
| MIGS-31.2 | Sequencing coverage | 1,276.9 × Illumina; 35.5 × pyrosequence |
| MIGS-30 | Assemblers | Newbler version 2.3, Velvet 1.0.13, phrap version SPS-4.24 |
| MIGS-32 | Gene calling method | Prodigal 1.4, GenePRIMP |
|  | INSDC ID | AHKT00000000 |
|  | GenBank date of release | January 24, 2012 |
|  | GOLD ID | Gi04604 |
|  | NCBI project ID | 60435 |
|  | Database: IMG | 2506783010 |
| MIGS-13 | Source material identifier | DSM 21528 |
|  | Project relevance | Tree of Life, GEBA |


Growth conditions and DNA isolation 

L. illini strain 3055T, DSM 21528, was grown in DSMZ medium 1113 (Leptospira Medium) at 30°C. DNA was isolated from 1-1.5 g of cell paste using the MasterPure Gram-positive DNA purification kit (Epicentre MGP04100) following the standard protocol recommended by the manufacturer, with modification st/DL for cell lysis as described in Wu et al. 2009 [51]. DNA is available through the DNA Bank Network [53].



Genome sequencing and assembly 

The genome was sequenced using a combination of Illumina and 454 sequencing platforms. All general aspects of library construction and sequencing can be found at the JGI website [54]. Pyrosequencing reads were assembled using the Newbler assembler (Roche). The initial Newbler assembly, consisting of 140 contigs in three scaffolds, was converted into a phrap [55] assembly by making fake reads from the consensus, to collect the read pairs in the 454 paired-end library. Illumina GAii sequencing data (5,940 Mb) were assembled with Velvet [56], and the consensus sequences were shredded into 1.5 kb overlapping fake reads and assembled together with the 454 data. The 454 draft assembly was based on 179 Mb of 454 draft data and all of the 454 paired-end data. Newbler parameters were -consed -a 50 -l 350 -g -m -ml 20. The Phred/Phrap/Consed software package [55] was used for sequence assembly and quality assessment in the subsequent finishing process. After the shotgun stage, reads were assembled with parallel phrap (High Performance Software, LLC). Possible mis-assemblies were corrected with gapResolution [54], Dupfinisher [57], or by sequencing cloned bridging PCR fragments with subcloning. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks (J.-F. Chang, unpublished). A total of 103 additional reactions and one shatter library were necessary to close gaps and to raise the quality of the finished sequence. Illumina reads were also used to correct potential base errors and increase consensus quality using Polisher, software developed at JGI [58]. The error rate of the completed genome sequence is less than 1 in 100,000. Together, the combination of the Illumina and 454 sequencing platforms provided 1,312.4 × coverage of the genome. The final assembly contained 488,975 pyrosequence and 75,603,747 Illumina reads.
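In the "shredding" step, Velvet consensus contigs were cut into 1.5 kb overlapping fake reads for co-assembly with the 454 data. The Python sketch below illustrates the idea. Only the 1.5 kb read length comes from the text; the step size, and therefore the overlap, is a hypothetical parameter, and the JGI implementation itself is not reproduced here.

```python
def shred(consensus: str, read_length: int = 1500, step: int = 500) -> list[str]:
    """Cut a consensus sequence into overlapping fixed-length pseudo-reads.

    read_length follows the 1.5 kb stated in the text; step (and hence the
    overlap of read_length - step bases) is an assumed value.
    """
    if step <= 0 or step > read_length:
        raise ValueError("step must be between 1 and read_length")
    if len(consensus) <= read_length:
        return [consensus]
    last_start = len(consensus) - read_length
    reads = [consensus[i:i + read_length] for i in range(0, last_start + 1, step)]
    # Add a final read flush with the 3' end if the regular steps stop short of it.
    if last_start % step != 0:
        reads.append(consensus[last_start:])
    return reads
```

Because the step never exceeds the read length, consecutive pseudo-reads always overlap, and the extra read at the end ensures the 3' tail of each contig is represented.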



Genome annotation 

Genes were identified using Prodigal [59] as part of the DOE-JGI genome annotation pipeline [60], followed by a round of manual curation using the JGI GenePRIMP pipeline [61]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [62].
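The translation step mentioned above can be illustrated with a small Python function. This is a sketch, not the JGI pipeline. It assumes NCBI translation table 11 (the bacterial code), whose amino-acid assignments match the standard code. For simplicity, it treats only ATG, GTG and TTG as start codons translated to methionine; table 11 permits a few more.

```python
BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (standard code; table 11 shares these assignments).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))
START_CODONS = {"ATG", "GTG", "TTG"}  # simplified start-codon set

def translate_cds(cds: str) -> str:
    """Translate a CDS: an initial start codon becomes Met, a terminal stop is dropped."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    protein = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        if i == 0 and codon in START_CODONS:
            protein.append("M")
        else:
            protein.append(CODON_TABLE.get(codon, "X"))  # X for ambiguous codons
    peptide = "".join(protein)
    return peptide[:-1] if peptide.endswith("*") else peptide
```

For example, translate_cds("GTGAAATGA") gives "MK": the alternative start GTG is read as methionine rather than valine, which is why gene callers handle the first codon specially.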

Genome properties 

The genome statistics are provided in Table 3 and Figure 3. The assembly of the draft genome sequence consists of three scaffolds of 4,325,094 bp, 184,087 bp and 13,579 bp length, respectively, and has a G+C content of 54.3%. Of the 4,277 genes predicted, 4,230 were protein-coding genes and 47 were RNA genes; 69 pseudogenes were also identified. The majority of the protein-coding genes (60.3%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
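The scaffold lengths and gene counts quoted above can be cross-checked with a few lines of Python. The numbers are copied from the text and Table 3; the variable names are illustrative. Gene-category percentages are taken relative to the 4,277 total genes, which reproduces the '% of total' column of Table 3 for these rows. Note that on this basis the 60.3% for genes with a predicted function refers to all 4,277 genes, not only to the protein-coding ones.

```python
SCAFFOLD_LENGTHS_BP = [4_325_094, 184_087, 13_579]
TOTAL_GENES = 4_277
GENE_COUNTS = {
    "protein-coding genes": 4_230,
    "RNA genes": 47,
    "pseudogenes": 69,
    "genes with function prediction": 2_579,
}

# The three scaffolds should add up to the reported genome size.
genome_size_bp = sum(SCAFFOLD_LENGTHS_BP)
# Percentages relative to all predicted genes, rounded as in Table 3.
percent_of_total = {name: round(100 * n / TOTAL_GENES, 2) for name, n in GENE_COUNTS.items()}

print(f"{genome_size_bp:,} bp")
for name, pct in percent_of_total.items():
    print(f"{name}: {pct:.2f}%")
```

The scaffold sum equals the 4,522,760 bp genome size, and the 4,230 protein-coding plus 47 RNA genes account for all 4,277 genes.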



Table 3. Genome statistics

| Attribute | Value | % of total |
| Genome size (bp) | 4,522,760 | 100.00 |
| DNA coding region (bp) | 4,079,818 | 90.21 |
| DNA G+C content (bp) | 2,453,341 | 54.26 |
| Number of scaffolds | 3 |  |
| Extrachromosomal elements | unknown |  |
| Total genes | 4,277 | 100.00 |
| RNA genes | 47 | 1.10 |
| rRNA operons | 1 |  |
| tRNA genes | 41 | 0.96 |
| Protein-coding genes | 4,230 | 98.90 |
| Pseudogenes | 69 | 1.61 |
| Genes with function prediction | 2,579 | 60.30 |
| Genes in paralog clusters | 1,764 | 41.24 |
| Genes assigned to COGs | 2,805 | 65.58 |
| Genes assigned Pfam domains | 2,865 | 66.99 |
| Genes with signal peptides | 1,481 | 34.63 |
| Genes with transmembrane helices | 1,089 | 25.46 |
| CRISPR repeats | 0 |  |





Figure 3. Graphical map of the largest scaffold. From bottom to top: genes on the forward strand (colored by COG categories), genes on the reverse strand (colored by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content, GC skew (purple/olive).



Table 4. Number of genes associated with the general COG functional categories

| Code | Value | % | Description |
| J | 156 | 5.0 | Translation, ribosomal structure and biogenesis |
| A | 0 | 0.0 | RNA processing and modification |
| K | 201 | 6.5 | Transcription |
| L | 194 | 6.3 | Replication, recombination and repair |
| B | 4 | 0.1 | Chromatin structure and dynamics |
| D | 34 | 1.1 | Cell cycle control, cell division, chromosome partitioning |
| Y | 0 | 0.0 | Nuclear structure |
| V | 61 | 2.0 | Defense mechanisms |
| T | 303 | 9.8 | Signal transduction mechanisms |
| M | 226 | 7.3 | Cell wall/membrane/envelope biogenesis |
| N | 108 | 3.5 | Cell motility |
| Z | 0 | 0.0 | Cytoskeleton |
| W | 0 | 0.0 | Extracellular structures |
| U | 74 | 2.4 | Intracellular trafficking, secretion, and vesicular transport |
| O | 119 | 3.8 | Posttranslational modification, protein turnover, chaperones |
| C | 160 | 5.2 | Energy production and conversion |
| G | 111 | 3.6 | Carbohydrate transport and metabolism |
| E | 189 | 6.1 | Amino acid transport and metabolism |
| F | 60 | 1.9 | Nucleotide transport and metabolism |
| H | 139 | 4.5 | Coenzyme transport and metabolism |
| I | 131 | 4.2 | Lipid transport and metabolism |
| P | 128 | 4.1 | Inorganic ion transport and metabolism |
| Q | 43 | 1.4 | Secondary metabolites biosynthesis, transport and catabolism |
| R | 401 | 12.9 | General function prediction only |
| S | 260 | 8.4 | Function unknown |
|  | 1,472 | 34.4 | Not in COGs |
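The percentage column of Table 4 can be reproduced with the Python sketch below. The denominator is our inference from the numbers and is not stated in the source. The 25 category counts sum to 3,102, which is more than the 2,805 genes assigned to COGs in Table 3, presumably because some genes fall into more than one category. Dividing by 3,102 reproduces every category percentage, whereas the 'Not in COGs' value matches 1,472 out of the 4,277 total genes.

```python
COG_COUNTS = {
    "J": 156, "A": 0, "K": 201, "L": 194, "B": 4, "D": 34, "Y": 0, "V": 61,
    "T": 303, "M": 226, "N": 108, "Z": 0, "W": 0, "U": 74, "O": 119, "C": 160,
    "G": 111, "E": 189, "F": 60, "H": 139, "I": 131, "P": 128, "Q": 43,
    "R": 401, "S": 260,
}
TOTAL_GENES = 4_277
NOT_IN_COGS = 1_472

# Inferred denominator: the total number of category assignments.
total_assignments = sum(COG_COUNTS.values())
category_percent = {code: round(100 * n / total_assignments, 1) for code, n in COG_COUNTS.items()}
# The 'Not in COGs' row appears to be relative to all predicted genes instead.
not_in_cogs_percent = round(100 * NOT_IN_COGS / TOTAL_GENES, 1)
```

Keeping the two denominators separate explains why the category percentages and the 'Not in COGs' percentage do not add up to 100%.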